Task scheduling strategy based on data stream classification in Heron

ZHANG Yitian, YU Jiong, LU Liang, LI Ziyang

Journal of Computer Applications 2019, 39 (4): 1106-1116. DOI: 10.11772/j.issn.1001-9081.2018081848

Heron, a newer platform for big data stream processing, uses round-robin scheduling by default, which ignores both the runtime state of the topology and the effect that different communication modes among task instances have on Heron's performance. To solve this problem, a task scheduling strategy based on Data Stream Classification in Heron (DSC-Heron) was proposed, comprising a data stream classification algorithm, a data stream cluster allocation algorithm and a data stream classification scheduling algorithm. Firstly, the instance allocation model of Heron was established to clarify the difference in communication overhead among the communication modes of task instances. Secondly, the data streams were classified according to the real-time data stream size between task instances, based on the data stream classification model of Heron. Finally, the packing plan of Heron was constructed by using interrelated high-frequency data streams as the basic scheduling unit, minimizing communication cost by transforming as many inter-node data streams as possible into intra-node ones. After running the SentenceWordCount, WordCount and FileWordCount topologies in a Heron cluster environment with 9 nodes, the results show that, compared with the Heron default scheduling strategy, DSC-Heron improves system complete latency, inter-node communication overhead and system throughput by 8.35%, 7.07% and 6.83% respectively; in terms of load balancing, the standard deviations of CPU usage and memory usage of the working nodes decrease by 41.44% and 41.23% respectively. All experimental results show that DSC-Heron can effectively improve the performance of the topologies, and has the most significant optimization effect on the FileWordCount topology, which is close to a real application scenario.
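The classification and grouping idea in this abstract can be illustrated with a minimal sketch. It is not the DSC-Heron implementation: the fixed `threshold`, the function names and the use of union-find to group instances are all assumptions made for illustration. Streams at or above the threshold are treated as high-frequency, and instances joined by high-frequency streams form one scheduling unit.

```python
from collections import defaultdict


def classify_streams(rates, threshold):
    """Split streams into high- and low-frequency sets by observed rate.

    rates maps (sender, receiver) pairs to a measured stream size.
    threshold is a hypothetical cut-off, not a value from the paper.
    """
    high = {pair: r for pair, r in rates.items() if r >= threshold}
    low = {pair: r for pair, r in rates.items() if r < threshold}
    return high, low


def cluster_instances(instances, high_streams):
    """Group instances that are linked by high-frequency streams (union-find)."""
    parent = {i: i for i in instances}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in high_streams:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for i in instances:
        groups[find(i)].add(i)
    # Sorted output keeps the result deterministic.
    return sorted(sorted(g) for g in groups.values())
```

Each returned group would then be placed on a single node where capacity allows, so that its high-frequency streams stay intra-node.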

Dynamic task dispatching strategy for stream processing based on flow network

LI Ziyang, YU Jiong, BIAN Chen, LU Liang, PU Yonglin

Journal of Computer Applications 2018, 38 (9): 2560-2567. DOI: 10.11772/j.issn.1001-9081.2017122910

Concerning the problem that a sharp increase in data input rate raises computing latency and degrades real-time performance in big data stream processing platforms, a dynamic dispatching strategy based on flow network was proposed and applied to a data stream processing platform named Apache Flink. Firstly, a Directed Acyclic Graph (DAG) was transformed into a flow network by defining the capacity and flow of every edge, and a capacity detection algorithm was used to ascertain the capacity value of every edge. Secondly, a maximum flow algorithm was used to acquire the improved network and the optimization path in order to raise the throughput of the cluster when the data input rate is increasing; meanwhile the feasibility of the algorithm was proved by evaluating its time-space complexity. Finally, the influence of an important parameter on the algorithm execution was discussed, and recommended parameter values for different types of jobs were obtained by experiments. The experimental results show that, compared with the original dispatching strategy of Apache Flink, the strategy improves throughput by more than 16.12% during the increasing phases of the data input rate in different types of benchmarks, so the dynamic dispatching strategy efficiently raises the throughput of the cluster under the premise of the task latency constraint.
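The abstract does not say which maximum flow algorithm was used. As an illustration only, the sketch below applies the standard Edmonds-Karp algorithm (breadth-first augmenting paths) to a DAG whose edge capacities are assumed to be already known; the node names and capacity values in the usage note are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity maps each node to a dict of {successor: edge capacity}.
    """
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

For a small job graph such as `{"source": {"map1": 10, "map2": 5}, "map1": {"reduce": 7}, "map2": {"reduce": 8}, "reduce": {"sink": 12}}`, calling `max_flow(graph, "source", "sink")` gives the largest rate the network can carry from source to sink.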

Task scheduling algorithm based on weight in Storm

LU Liang, YU Jiong, BIAN Chen, YING Changtian, SHI Kangli, PU Yonglin

Journal of Computer Applications 2018, 38 (3): 699-706. DOI: 10.11772/j.issn.1001-9081.2017082125

Apache Storm, a typical platform for big data stream computing, uses a round-robin scheduling algorithm as the default scheduler, which does not consider the fact that differences of computational and communication cost are ubiquitous among different tasks and different data streams in one topology. Hence optimization is needed in terms of load balance and communication cost. To solve this problem, a Task Scheduling Algorithm based on Weight in Storm (TSAW-Storm) was proposed. In the algorithm, CPU occupation was taken as the weight of a task in a specific topology, and similarly tuple rate between a pair of tasks was taken as the weight of a data stream. Then tasks were assigned to the most suitable work node gradually by maximizing the gain of weight of data streams via transforming inter-node data streams into intra-node ones as many as possible with load balance ensured in order to reduce network overhead. Experimental results show that TSAW-Storm can reduce latency and inter-node tuple rate by about 30.0% and 32.9% respectively, and standard deviation of CPU load of work nodes is only 25.8% when compared to Storm default scheduling algorithm in WordCount benchmark with 8 work nodes. Additionally, online scheduler is deployed in contrast experiment. Experimental results show that TSAW-Storm can reduce latency, inter-node tuple rate and standard deviation of CPU load by about 7.76%, 11.8% and 5.93% respectively, which needs only a bit of executive overhead compared to online scheduler. Therefore, the proposed algorithm can reduce communication cost as well as improve load balance effectively, which makes a great contribution to the efficient operation of Apache Storm.
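The weighted assignment described above can be sketched as a simple greedy procedure. This is an illustration under stated assumptions, not the TSAW-Storm algorithm itself: the visiting order (tasks with the heaviest streams first), the single `cpu_cap` per node and the tie-break on current load are all hypothetical choices.

```python
def assign_tasks(task_cpu, stream_rate, nodes, cpu_cap):
    """Greedily place tasks so that heavy streams stay inside one node.

    task_cpu maps task -> CPU weight; stream_rate maps (task, task) -> tuple rate.
    cpu_cap is an assumed per-node CPU limit used to keep the load balanced.
    """
    load = {n: 0.0 for n in nodes}
    placement = {}

    def rate(a, b):
        # Streams are treated as undirected for the purpose of co-location.
        return stream_rate.get((a, b), 0) + stream_rate.get((b, a), 0)

    def total_rate(task):
        return sum(r for (a, b), r in stream_rate.items() if task in (a, b))

    def gain(task, node):
        # Stream weight that becomes intra-node if task joins this node.
        return sum(rate(task, other) for other, n in placement.items() if n == node)

    for task in sorted(task_cpu, key=total_rate, reverse=True):
        candidates = [n for n in nodes if load[n] + task_cpu[task] <= cpu_cap]
        if not candidates:
            raise ValueError(f"no node has capacity for {task}")
        # Prefer the largest gain; break ties by the lightest current load.
        best = max(candidates, key=lambda n: (gain(task, n), -load[n]))
        placement[task] = best
        load[best] += task_cpu[task]
    return placement


def inter_node_rate(placement, stream_rate):
    """Total tuple rate of streams whose endpoints sit on different nodes."""
    return sum(r for (a, b), r in stream_rate.items() if placement[a] != placement[b])
```

Comparing `inter_node_rate` for this placement against a round-robin placement gives a rough sense of how much network traffic co-location removes.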

Task scheduling strategy based on topology structure in Storm

LIU Su, YU Jiong, LU Liang, LI Ziyang

Journal of Computer Applications 2018, 38 (12): 3481-3489. DOI: 10.11772/j.issn.1001-9081.2018040741

In order to solve the problems of large communication cost and unbalanced load in the default round-robin scheduling strategy of the Storm stream computing platform, a Task Scheduling Strategy based on Topology Structure (TS²) in Storm was proposed. Firstly, the work nodes with sufficient available Central Processing Unit (CPU) resources were selected, and only one process was allocated to each work node to eliminate the communication cost between processes within a node and optimize the process deployment. Then, the topology structure was analyzed, the component with the largest degree in the topology was found, and the threads of that component were assigned the highest priority. Finally, subject to the maximum number of threads that a node could carry, associated tasks were deployed to the same node as far as possible to reduce the communication cost between nodes, improve the load balance of the cluster and optimize the thread deployment. The experimental results show that, in terms of system latency, the average optimization rate of TS² is 16.91% and 5.69% compared with the Storm default scheduling strategy and the offline scheduling strategy respectively, which effectively improves the real-time performance of the system. Additionally, compared with the Storm default scheduling strategy, TS² reduces the communication cost between nodes by 15.75% and improves average throughput by 14.21%.
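The step that finds the component with the largest degree can be illustrated as follows. This is a sketch only: the edge-list representation of the topology and the alphabetical tie-break are assumptions, and the abstract does not specify how ties are handled.

```python
from collections import Counter


def component_priority(edges):
    """Rank topology components by degree (in-degree plus out-degree).

    edges is a list of (upstream, downstream) component pairs.
    Components with equal degree are ordered by name (an assumed tie-break).
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return sorted(degree, key=lambda c: (-degree[c], c))
```

The first element of the returned list is the component whose threads would be scheduled first.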

Energy-efficient strategy for threshold control in big data stream computing environment

PU Yonglin, YU Jiong, WANG Yuefei, LU Liang, LIAO Bin, HOU Dongxue

Journal of Computer Applications 2017, 37 (6): 1580-1586. DOI: 10.11772/j.issn.1001-9081.2017.06.1580

In the field of big data real-time analysis and computing, stream computing is becoming steadily more important, while the energy consumed by processing data on stream computing platforms keeps rising. In order to solve this problem, an Energy-efficient Strategy for Threshold Control (ESTC) was proposed, which changes the way nodes process data in stream computing. First of all, the threshold of each work node was determined according to the difference in system load. Secondly, according to the threshold of the work node, the system data stream was randomly sampled to determine the physical voltage to which the system should be adjusted in different data processing situations. Finally, the system power was determined according to the different physical voltages. The experimental results and theoretical analysis show that, in a stream computing cluster consisting of 20 normal PCs, the system based on ESTC saves about 35.2% more energy than the original system. In addition, the ratio of performance to energy consumption under ESTC is 0.0803 tuple/(s·J), while that of the original system is 0.0698 tuple/(s·J). Therefore, the proposed ESTC can effectively reduce energy consumption without affecting system performance.

Video recommendation algorithm based on clustering and hierarchical model

JIN Liang, YU Jiong, YANG Xingyao, LU Liang, WANG Yuefei, GUO Binglei, LIAO Bin

Journal of Computer Applications 2017, 37 (10): 2828-2833. DOI: 10.11772/j.issn.1001-9081.2017.10.2828

Concerning the problems of data sparseness, cold start and poor user experience in recommendation systems, a video recommendation algorithm based on clustering and hierarchical model was proposed to improve the performance of the recommendation system and the user experience. Firstly, focusing on the user, similar users were obtained by Affinity Propagation (AP) clustering, then the historical online video data of similar users was collected and a recommendation set of videos was generated. Secondly, the user preference degree for a video was calculated and mapped into the tag weight of the video. Finally, a recommendation list of videos was generated by using an analytic hierarchy model to rank user preferences for the videos. The experimental results on the MovieLens Latest Dataset and a YouTube video review text dataset show that the proposed algorithm performs well in terms of Root-Mean-Square Error (RMSE) and recommendation accuracy.

Dynamic data stream load balancing strategy based on load awareness

LI Ziyang, YU Jiong, BIAN Chen, WANG Yuefei, LU Liang

Journal of Computer Applications 2017, 37 (10): 2760-2766. DOI: 10.11772/j.issn.1001-9081.2017.10.2760

Concerning the problems of unbalanced load and incomplete evaluation of nodes in big data stream processing platforms, a dynamic load balancing strategy based on a load awareness algorithm was proposed and applied to a data stream processing platform named Apache Flink. Firstly, the computational delay time of the nodes was obtained by applying the depth-first search algorithm to the Directed Acyclic Graph (DAG) and was taken as the basis for evaluating node performance, and the load balancing strategy was created. Secondly, the load migration technology for data streams was implemented based on the data block management strategy, and both global and local load optimization were implemented through feedback. Finally, the feasibility of the algorithm was proved by evaluating its time-space complexity, and the influence of important parameters on the algorithm execution was discussed. The experimental results show that the proposed algorithm increases the efficiency of task execution by optimizing the load sharing between nodes, and that task execution time is shortened by 6.51% on average compared with the traditional load balancing strategy of Apache Flink.

Coordinator selection strategy based on RAMCloud

WANG Yuefei, YU Jiong, LU Liang

Journal of Computer Applications 2016, 36 (9): 2402-2408. DOI: 10.11772/j.issn.1001-9081.2016.09.2402

Focusing on the issue that ZooKeeper cannot meet the low-latency and quick-recovery requirements of RAMCloud, a Coordinator Election Strategy (CES) based on RAMCloud was proposed. First of all, according to the network environment of RAMCloud and factors of the coordinator itself, the performance indexes of the coordinator were divided into two categories, individual indexes and coordinator indexes, and models were built for each. Next, the operation of RAMCloud was divided into an error-free running period and a data recovery period, a fitness function was built for each, and the two fitness functions were merged into a total fitness function according to time ratio. Lastly, on the basis of the fitness value of the RAMCloud Backup Coordinator (RBC), a new operator combining randomness with the capacity to select an ideal target was proposed: CES first eliminated poorly performing RBCs by screening and then, with the range of choice narrowed, selected the final RBC from the collection of ideal coordinators by roulette. The experimental results showed that, compared with other RBCs in the NS2 simulation environment, the coordinator selected by CES decreased latency by 19.35%; compared with ZooKeeper in the RAMCloud environment, the coordinator selected by CES reduced recovery time by 10.02%. In practical application of RAMCloud, the proposed CES can choose a coordinator with better performance and meet the demand for low latency and quick recovery.
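The screen-then-roulette operator can be sketched as below. Everything specific here is an assumption for illustration: the weighted sum used to merge the two fitness functions, the `keep_ratio` used for screening and the function names are hypothetical, and fitness values are assumed to be positive.

```python
import random


def total_fitness(f_normal, f_recovery, normal_ratio):
    """Merge two fitness values by time ratio (assumed to be a weighted sum)."""
    return normal_ratio * f_normal + (1.0 - normal_ratio) * f_recovery


def screen(fitness, keep_ratio=0.5):
    """Keep only the best-performing candidates, ranked by fitness."""
    ranked = sorted(fitness, key=fitness.get, reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]


def roulette(candidates, fitness, rng):
    """Pick one candidate with probability proportional to its fitness."""
    total = sum(fitness[c] for c in candidates)
    pick = rng.uniform(0, total)
    running = 0.0
    for c in candidates:
        running += fitness[c]
        if pick <= running:
            return c
    return candidates[-1]  # Guard against floating-point round-off.


def select_coordinator(fitness, rng=None, keep_ratio=0.5):
    """Screen out weak backup coordinators, then choose one by roulette."""
    rng = rng or random.Random()
    return roulette(screen(fitness, keep_ratio), fitness, rng)
```

Screening first guarantees that a poorly performing candidate is never chosen, while the roulette step keeps some randomness among the remaining good candidates.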

Parallel access strategy for big data objects based on RAMCloud

CHU Zheng, YU Jiong, LU Liang, YING Changtian, BIAN Chen, WANG Yuefei

Journal of Computer Applications 2016, 36 (6): 1526-1532. DOI: 10.11772/j.issn.1001-9081.2016.06.1526

RAMCloud only supports the storage of small objects no larger than 1 MB, so a big data object larger than 1 MB cannot be stored in a RAMCloud cluster. In order to resolve this storage limitation in RAMCloud, a parallel access strategy for big data objects based on RAMCloud was proposed. Firstly, the big data object was divided into several small data objects of at most 1 MB each. Then the data summary was created in the client, and the small data objects were stored in the RAMCloud cluster by the parallel access strategy. At the reading stage, the data summary was read first, then the small data objects were read in parallel from the RAMCloud cluster according to the data summary, and finally the small data objects were merged into the big data object. The experimental results show that the storage time of the proposed parallel access strategy for big data objects can reach 16 to 18 μs and the reading time can reach 6 to 7 μs without destroying the architecture of the RAMCloud cluster. Under the InfiniBand network framework, the speedup of the proposed parallel strategy increases almost linearly, so that big data objects can be accessed rapidly and efficiently at the microsecond level, just like small data objects.
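The split, summarize and merge steps above can be sketched as follows. This sketch does not use the RAMCloud client API: a plain Python `dict` stands in for the cluster, a thread pool stands in for parallel access, and the chunk key format and summary layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MB object limit stated in the abstract


def write_big_object(store, key, data, chunk_size=CHUNK_SIZE):
    """Split data into chunks of at most chunk_size and store them in parallel.

    Returns a client-side summary that records how to reassemble the object.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    chunk_keys = [f"{key}#{n}" for n in range(len(chunks))]
    with ThreadPoolExecutor() as pool:
        # list() forces all writes to finish before the pool is closed.
        list(pool.map(store.__setitem__, chunk_keys, chunks))
    return {"key": key, "size": len(data), "chunk_keys": chunk_keys}


def read_big_object(store, summary):
    """Read the chunks named in the summary in parallel and merge them."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so the chunks join correctly.
        chunks = list(pool.map(store.__getitem__, summary["chunk_keys"]))
    return b"".join(chunks)
```

Keeping the summary on the client means a reader needs only one small lookup before it can fetch all chunks concurrently.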

Strategy for object index based on RAMCloud

WANG Yuefei, YU Jiong, LU Liang

Journal of Computer Applications 2016, 36 (5): 1222-1227. DOI: 10.11772/j.issn.1001-9081.2016.05.1222

To address low utilization, RAMCloud changes the positions of objects, which causes hash-based lookup to fail to locate objects and lowers the efficiency of data search. In addition, since the needed data cannot be located rapidly during data recovery, the segments returned from each backup cannot be organized well. To solve these problems, a RAMCloud Global Key (RGK) and a binary index tree were proposed. RGK is divided into three parts: the position on the master, on the segment, and on the object. The first two parts constitute the Coordinator Index Key (CIK), with which the Coordinator Index Tree (CIT) can locate the master of a segment during recovery. The last two parts constitute the Master Index Key (MIK), with which the Master Index Tree (MIT) can obtain objects quickly even when the data has been moved in memory. Compared with the traditional RAMCloud cluster, the time to obtain objects is clearly reduced as data throughput increases, and both the idle time of the coordinator and the log recombination time decline. The experimental results show that the global key, supported by the binary index tree, can reduce the time needed for obtaining objects and for recovery.

Energy-efficient strategy of distributed file system based on data block clustering storage

WANG Zhengying, YU Jiong, YING Changtian, LU Liang

Journal of Computer Applications 2015, 35 (2): 378-382. DOI: 10.11772/j.issn.1001-9081.2015.02.0378

Concerning the low server utilization and complicated energy management caused by the random block placement strategy in distributed file systems, a vector of access features was built for each data block to describe random block access behavior. The K-means algorithm was then used to cluster the blocks according to these vectors, and the datanodes were divided into multiple regions, each storing the data blocks of a different cluster. When the system load was low, the data blocks were dynamically reconfigured according to the clustering results, and unnecessary datanodes could sleep to reduce energy consumption. A flexible setting of the distance parameters between clusters made the strategy suitable for scenarios with different requirements for energy consumption and utilization. Mathematical analysis and experimental results show that, compared with hot-cold zoning strategies, the proposed method saves energy more efficiently, reducing energy consumption by 35% to 38%.
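The clustering step can be illustrated with a minimal K-means sketch. The content of the feature vectors is an assumption (here, hypothetical access counts in two time windows), as are the caller-supplied initial centroids and the fixed iteration count; the abstract does not give these details.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means over tuples of numbers.

    points are per-block access feature vectors (hypothetical layout).
    centroids are the initial cluster centres supplied by the caller.
    Returns the final centroids and the clusters from the last assignment.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```

Blocks that land in the same cluster would then be stored in the same datanode region, so that regions holding rarely accessed clusters can sleep.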

Data migration model based on RAMCloud hierarchical storage architecture

GUO Gang, YU Jiong, LU Liang, YING Changtian, YIN Lutong

Journal of Computer Applications 2015, 35 (12): 3392-3397. DOI: 10.11772/j.issn.1001-9081.2015.12.3392

In order to achieve efficient storage of and access to huge amounts of online data under the hierarchical storage architecture of memory cloud, a Migration Model based on Data Significance (MMDS) was proposed. Firstly, the importance of the data itself was calculated from factors such as the size of the data, the importance of time and the total amount of user access. Secondly, the potential value of the data was evaluated by adopting users' similarity and the importance ranking of the PageRank algorithm in the recommendation system. The overall importance of the data was determined jointly by the importance of the data itself and its potential value. Then, a data migration mechanism was designed based on the importance of the data. The experimental results show that the proposed model can identify the importance of the data and place the data hierarchically, and that it improves the data access hit rate of the storage system compared with the Least Recently Used (LRU), Least Frequently Used (LFU) and Migration Strategy based on Data Value (MSDV) algorithms. The proposed model can relieve part of the storage pressure and improve data access performance.

Video recommendation algorithm fusing comment analysis and latent factor model

YIN Lutong, YU Jiong, LU Liang, YING Changtian, GUO Gang

Journal of Computer Applications 2015, 35 (11): 3247-3251. DOI: 10.11772/j.issn.1001-9081.2015.11.3247

Video recommenders still face many challenges, such as the lack of metadata for online videos and the difficulty of extracting features directly from multimedia data. Therefore, a Video Recommendation algorithm Fusing Comment analysis and Latent factor model (VRFCL) was proposed. Starting with video comments, it first analyzed the sentiment orientation of user comments on multiple videos and produced numeric values representing each user's attitude towards the corresponding video. Then it constructed a virtual rating matrix based on these numeric values, which made up for data sparsity to some extent. Taking the diversity and high dimensionality of online videos into consideration, and in order to dig deeper into users' latent interest in online videos, it adopted the Latent Factor Model (LFM) to categorize online videos. LFM adds a latent category feature to the traditional recommendation system, which is built on the dual user-item relationship. A series of experiments on YouTube review data was carried out to show that the VRFCL algorithm is effective.

Energy-efficient strategy for disks in RAMCloud

LU Liang, YU Jiong, YING Changtian, WANG Zhengying, LIU Jiankuang

Journal of Computer Applications 2014, 34 (9): 2518-2522. DOI: 10.11772/j.issn.1001-9081.2014.09.2518

The emergence of RAMCloud has improved the user experience of Online Data-Intensive (OLDI) applications. However, its energy consumption is higher than that of traditional cloud data centers. An energy-efficient strategy for disks under this architecture was put forward to solve this problem. Firstly, the fitness function and roulette wheel selection from genetic algorithms were introduced to choose energy-saving disks for persistent data backup; secondly, a reasonable buffer size was used to extend the average continuous idle time of disks, so that some of them could be put into standby during their idle time. The simulation experimental results show that the proposed strategy can effectively save about 12.69% of energy in a given RAMCloud system with 50 servers. The buffer size affects both the energy-saving effect and data availability, so the two must be weighed against each other.

Energy-efficient strategy for dynamic management of cloud storage replica based on user visiting characteristic

WANG Zhengying, YU Jiong, YING Changtian, LU Liang, BAN Aiqin

Journal of Computer Applications 2014, 34 (8): 2256-2259. DOI: 10.11772/j.issn.1001-9081.2014.08.2256

To address low server utilization and serious energy waste in cloud computing environments, an energy-efficient strategy for dynamic management of cloud storage replica based on user visiting characteristic was put forward. By transforming the study of user visiting characteristics into the calculation of the visiting temperature of each Block, a DataNode actively applied to sleep according to the global visiting temperature so as to save energy. The dormancy application and dormancy verification algorithms were given in detail, and the strategy for handling visits during DataNode dormancy was described explicitly. The experimental results show that, after adopting this strategy, 29%-42% of DataNodes can sleep, energy consumption is reduced by 31%, and server response time remains good. The performance analysis shows that the proposed strategy can effectively reduce energy consumption while guaranteeing data availability.

Optimal storing strategy based on small files in RAMCloud

YING Changtian, YU Jiong, LU Liang, LIU Jiankuang

Journal of Computer Applications 2014, 34 (11): 3104-3108. DOI: 10.11772/j.issn.1001-9081.2014.11.3104

RAMCloud stores data using a log segment structure. When a large number of small files are stored in RAMCloud, each small file occupies a whole segment, which may lead to heavy fragmentation inside the segments and low memory utilization. In order to solve this small file problem, a strategy based on file classification was proposed to optimize the storage of small files. Firstly, small files were classified into three categories: structurally related, logically related and independent files. Before uploading, a merging algorithm and a grouping algorithm were used to process these files respectively. The experiment demonstrates that, compared with non-optimized RAMCloud, the proposed strategy can improve memory utilization.

Design and implementation of MAC protocol for wireless sensor network based on ARM

Yi-duo LIN, De-yun GAO, Lu-lu LIANG, Si-dong ZHANG

Journal of Computer Applications 2010, 30 (05): 1145-1148.

The IEEE 802.15.4 standard has become one of the most suitable standards for the underlying protocol stack of the Wireless Sensor Network (WSN) because of its low rate, low power consumption and short range. After studying the IEEE 802.15.4 Medium Access Control (MAC) protocol stack, IEEE 802.15.4 MAC layer protocol stack software was designed and implemented on the ARM9 Linux platform. The software was eventually released in the form of a Linux character driver. The results indicate that the software is practical and scalable.